After successfully completing the exercises on tidy data and listening to lengthy lectures on data formats as well as specific ways of importing them, it’s now your turn to get your feet wet with importing data in the tidyverse.
As in the exercises on tidy data, we will again work with several datasets, including the Titanic dataset from Kaggle and data on GDP per capita from Gapminder. You can find the datasets in the respective subfolders of the data folder. However, because importing data often requires just a single function, we have also created some additional tasks related to importing data with tidyverse packages.
This being said, let’s start with some easy data importing.
Use the readr library and a function that starts with read_...
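As a minimal sketch of such an import, the example below writes a small made-up CSV to a temporary file and reads it back with read_csv() from readr. The file, its column names, and its values are stand-ins (assumptions for illustration), not the actual Titanic file from the data folder; in the exercise you would pass the path to that file instead.

```r
library(readr)

# Hypothetical stand-in for the Titanic CSV: a tiny file in a temp location
tmp_csv <- tempfile(fileext = ".csv")
writeLines(
  c("PassengerId,Sex,Age",
    "1,male,22",
    "2,female,38",
    "3,female,26"),
  tmp_csv
)

# read_csv() returns a tibble and guesses the column types
titanic_demo <- read_csv(tmp_csv, show_col_types = FALSE)
titanic_demo

# The text column is imported as character, not as factor
class(titanic_demo$Sex)
```

Other delimiters have their own read_ functions in readr, for example read_csv2() for semicolon-separated files and read_tsv() for tab-separated files.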
You may have noticed that the function you just used imports factor variables as characters by default. For some analyses, this may not be what we want (for example, if we want to use sex as a predictor in a regression).
Convert Sex to a factor.
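Two common ways of getting a factor are sketched below, again on a small made-up CSV (the file and its contents are assumptions, not the real Titanic data). The first declares the column type at import via the col_types argument, the second converts the column after a default import.

```r
library(readr)

# Hypothetical stand-in file, as before
tmp_csv <- tempfile(fileext = ".csv")
writeLines(
  c("PassengerId,Sex,Age",
    "1,male,22",
    "2,female,38",
    "3,female,26"),
  tmp_csv
)

# Option 1: declare Sex as a factor at import; other columns are still guessed
titanic_fct <- read_csv(tmp_csv, col_types = cols(Sex = col_factor()))

# Option 2: import with the defaults, then convert the column afterwards
titanic_chr <- read_csv(tmp_csv, show_col_types = FALSE)
titanic_chr$Sex <- factor(titanic_chr$Sex)

is.factor(titanic_fct$Sex)
is.factor(titanic_chr$Sex)
```

Declaring the type at import keeps the whole import in one call, which is convenient when the same file is read in several scripts. Note that the two options may order the factor levels differently, so it is worth checking levels() before relying on a reference category.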
We have already worked with the Titanic data quite a bit and will continue to do so in the sessions on data wrangling. Let’s import some other data for a change.
Use the readxl package for this import task.
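A basic Excel import with readxl might look like the following sketch. Because the name of the exercise file is not given here, the example uses datasets.xlsx, one of the example workbooks that ships with readxl; treat it as a placeholder for the actual file in the data folder.

```r
library(readxl)

# Path to an example workbook bundled with readxl (placeholder for the exercise file)
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook before importing
excel_sheets(xlsx_path)

# By default read_excel() reads the first sheet and returns a tibble
demo_xl <- read_excel(xlsx_path)
demo_xl
```

To read a different sheet, pass its name or position via the sheet argument of read_excel().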
As you may have noticed, the format of the output of the two importing functions is the same (tibbles in both cases). Sometimes, however, the contents of an Excel file are not that easy to import. We will illustrate this with the help of the Unicorns on Unicycles dataset. This is what is known about this data according to its creator:
The documents were recently unearthed from a hidden chest in Delft and seem to be written by Rudolphus Hogervorstus, my great great great uncle, in 1681. These documents show that he was a scientist studying the then roaming herds of unicorns in the area around Delft. Unfortunately these animals are extinct now. His work contains multiple tables, carefully written down, documenting the population of unicorns over time in multiple places and related to that the sales and numbers of unicycles in those countries. According to Rudolphus the unicorn populations and unicycles are related “The presence of the cone on the unicorn hints at a very defined sense of equilibrium, it is therefore only natural to assume unicorns ride unicycles”. As part of the archival process these tables were copied, as Rudolphus himself would say: “with the black magic, so vile it could not be discussed for hell would come descent upon us” into satans own spawn: Microsoft Excel.
Source: https://github.com/RMHogervorst/unicorns_on_unicycles
total_turnover variable only read in the cell range A1:C43
You can use the argument range = range_definition (see ?read_excel).
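Restricting an import to a cell range such as A1:C43 can be sketched as follows. The workbook is again the datasets.xlsx example file bundled with readxl, used as a stand-in for the Unicorns on Unicycles file, whose name is not given here.

```r
library(readxl)

# Example workbook shipped with readxl (placeholder for the unicorn data)
xlsx_path <- readxl_example("datasets.xlsx")

# Read only columns A to C and rows 1 to 43 of the first sheet;
# row 1 of the range is used for the column names
demo_range <- read_excel(xlsx_path, range = "A1:C43")

dim(demo_range)
names(demo_range)
```

Since the first row of the range supplies the column names, a range spanning rows 1 to 43 yields 42 rows of data. Columns outside the range are never read, which is a simple way of leaving out variables you do not need.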